Fast semi-automatic semantic annotation for spoken dialog systems

Authors

  • Ruhi Sarikaya
  • Yuqing Gao
  • Paola Virga

Abstract

This paper describes a bootstrapping methodology for semi-automatic semantic annotation of a “mini-corpus”, which is conventionally annotated manually to train an initial parser used in natural language understanding (NLU) systems. We propose to cast the problem of semantic annotation as a classification problem: each word is assigned a unique set of semantic tag(s) and/or label(s) from the universal tag/label set. This approach enables “local” semantic annotation, resulting in partially annotated sentences. The proposed method reduces the annotation time and cost, which form a major bottleneck in the development of NLU systems. We present a set of experiments conducted on a medical-domain “mini-corpus” that contains 10K hand-annotated sentences. Three annotation methods are compared: parser-based (baseline), similarity-based and classification-based annotation. The support vector machine (SVM) based classification scheme is shown to outperform both similarity-based and parser-based annotation.
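The abstract frames annotation as per-word classification. A minimal sketch of that idea follows, built on stated assumptions. The tag names (`DRUG`, `FREQ`, `O`) and toy sentences are hypothetical; they are not the paper's medical tag set or data. The local features (the word plus its neighbours) are our choice. The classifier is a linear model trained by subgradient steps on an unregularized multiclass hinge loss. It is a dependency-free stand-in for the SVM the paper uses, not the authors' implementation. The hypothetical `min_margin` threshold leaves low-confidence words untagged, which is one way to produce partially annotated sentences for a human annotator to finish.

```python
from collections import defaultdict

# Hypothetical toy "mini-corpus": (words, per-word tags). Tag names are
# illustrative assumptions; "O" marks a word with no semantic tag.
TRAIN = [
    (["take", "aspirin", "twice", "daily"], ["O", "DRUG", "FREQ", "FREQ"]),
    (["give", "ibuprofen", "once", "daily"], ["O", "DRUG", "FREQ", "FREQ"]),
]


def features(words, i):
    """Local binary features for word i: the word itself and its immediate neighbours."""
    prev_word = words[i - 1] if i > 0 else "<s>"
    next_word = words[i + 1] if i + 1 < len(words) else "</s>"
    return ["word=" + words[i], "prev=" + prev_word, "next=" + next_word]


def score(weights, tag, feats):
    """Linear score of `tag`: sum of the weights of the active features."""
    return sum(weights[tag][f] for f in feats)


def train(data, epochs=10):
    """Multiclass hinge-loss (SVM-style) subgradient training, no regularization."""
    tagset = sorted({t for _, tags in data for t in tags})
    weights = {t: defaultdict(float) for t in tagset}
    for _ in range(epochs):
        for words, tags in data:
            for i, gold in enumerate(tags):
                feats = features(words, i)
                # Highest-scoring wrong tag (ties go to the first tag in sorted order).
                rival = max((t for t in tagset if t != gold),
                            key=lambda t: score(weights, t, feats))
                # Update only if the gold tag does not beat the rival by a margin of 1.
                if score(weights, gold, feats) - score(weights, rival, feats) < 1.0:
                    for f in feats:
                        weights[gold][f] += 1.0
                        weights[rival][f] -= 1.0
    return tagset, weights


def annotate(words, tagset, weights, min_margin=1.0):
    """Tag each word independently; return None for words the model is unsure about."""
    result = []
    for i in range(len(words)):
        feats = features(words, i)
        ranked = sorted(tagset, key=lambda t: score(weights, t, feats), reverse=True)
        margin = score(weights, ranked[0], feats) - score(weights, ranked[1], feats)
        result.append(ranked[0] if margin >= min_margin else None)
    return result


if __name__ == "__main__":
    tagset, weights = train(TRAIN)
    print(annotate(["take", "ibuprofen", "twice", "daily"], tagset, weights))
    # → ['O', 'DRUG', 'FREQ', 'FREQ']
    print(annotate(["take", "metformin", "at", "night"], tagset, weights))
    # → ['O', 'DRUG', None, 'FREQ']
```

In the second call, "at" has no informative features, so it stays unannotated. That is the kind of "local", partial annotation the abstract describes. Swapping in a regularized SVM solver would change the weights but not the per-word framing.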

Similar resources

Portability of Semantic Annotations for Fast Development of Dialogue Corpora

Generalization of spoken dialogue systems increases the need for fast development of spoken language understanding modules for semantic tagging of speaker’s turns. Statistical methods are performing well for this task but require large corpora to be trained. Collecting such corpora is expensive in time and human expertise. In this paper we propose a semi-automatic annotation process for fast pr...

Approche bayésienne de la composition sémantique dans les systèmes de dialogue oral

Focusing on the interpretation component of spoken dialog systems, this paper introduces a stochastic approach based on dynamic Bayesian networks to infer and compose semantic structures from speech. Word strings, basic concept sequences and composed semantic frames (as defined in the Berkeley FrameNet paradigm) are derived sequentially from the users’ inputs. A semi-automatic process provides ...

D3 Toolkit: A Development Toolkit for Daydreaming Spoken Dialog Systems

Recently various data-driven spoken language technologies have been applied to spoken dialog system development. However, high cost of maintaining the spoken dialog systems is one of the biggest challenges. In addition, a fixed corpus collected by human is never enough to cover diverse real user’s utterances. The concept of a daydreaming dialog system can solve the problem by making the system ...

Annotating Spoken Dialogs: From Speech Segments to Dialog Acts and Frame Semantics

We are interested in extracting semantic structures from spoken utterances generated within conversational systems. Current Spoken Language Understanding systems rely either on hand-written semantic grammars or on flat attribute-value sequence labeling. While the former approach is known to be limited in coverage and robustness, the latter lacks detailed relations amongst attribute-value pairs....

Semi-supervised Learning for Spoken Language Understanding Using Semantic Role Labeling

In a goal-oriented spoken dialog system, the major aim of language understanding is to classify utterances into one or more of the pre-defined intents and extract the associated named entities. Typically, the intents are designed by a human expert according to the application domain. Furthermore, these systems are trained using large amounts of data manually labeled using an already prepared la...

Publication date: 2004